Leveraging Recent Advances in Deep Learning for Audio-Visual Emotion Recognition

Abstract

Emotional expressions are the behaviors that communicate our emotional state or attitude to others. They are expressed through verbal and non-verbal communication. Complex human behavior can be understood by studying physical features from multiple modalities, mainly facial, vocal and physical gestures. Recently, spontaneous multi-modal emotion recognition has been extensively studied for human behavior analysis. In this paper, we propose a new deep learning-based approach for audio-visual emotion recognition. Our approach leverages recent advances in deep learning like knowledge distillation and high-performing deep architectures. The deep feature representations of the audio and visual modalities are fused based on a model-level fusion strategy. A recurrent neural network is then used to capture the temporal dynamics. Our proposed approach substantially outperforms state-of-the-art approaches in predicting valence on the RECOLA dataset. Moreover, our proposed visual facial expression feature extraction network outperforms state-of-the-art results on the AffectNet and Google Facial Expression Comparison datasets.
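
The pipeline outlined in the abstract (per-frame audio and visual features, fused at the model level, then passed through a recurrent network) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that fusion is plain concatenation of the two feature vectors, that the recurrent layer is a small Elman-style cell with random, untrained weights, and that valence is read out through a tanh so it falls in [-1, 1]. The names `concat_fusion`, `TinyRNN`, and `predict_valence` are hypothetical.

```python
import math
import random


def concat_fusion(audio_feat, visual_feat):
    """Fuse one frame's audio and visual features by concatenation (an assumption)."""
    return list(audio_feat) + list(visual_feat)


class TinyRNN:
    """Minimal Elman-style recurrent layer with a scalar readout (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_in = mat(hidden_dim, input_dim)    # input-to-hidden weights
        self.w_rec = mat(hidden_dim, hidden_dim)  # hidden-to-hidden weights
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def step(self, x, h):
        """Compute the next hidden state from the current input and previous state."""
        new_h = []
        for i in range(self.hidden_dim):
            s = sum(w * xi for w, xi in zip(self.w_in[i], x))
            s += sum(w * hi for w, hi in zip(self.w_rec[i], h))
            new_h.append(math.tanh(s))
        return new_h

    def predict_sequence(self, frames):
        """Return one tanh-squashed scalar per frame, carrying state across frames."""
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in frames:
            h = self.step(x, h)
            outputs.append(math.tanh(sum(w * hi for w, hi in zip(self.w_out, h))))
        return outputs


def predict_valence(audio_seq, visual_seq, rnn):
    """Fuse the two modalities frame by frame, then run the recurrent layer."""
    fused = [concat_fusion(a, v) for a, v in zip(audio_seq, visual_seq)]
    return rnn.predict_sequence(fused)


if __name__ == "__main__":
    # Toy features: 3 frames, 2 audio dimensions and 3 visual dimensions per frame.
    audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    visual = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [0.0, 1.0, 0.0]]
    model = TinyRNN(input_dim=5, hidden_dim=4)
    print(predict_valence(audio, visual, model))
```

Because the weights are random, the printed values carry no meaning; the sketch only shows how fused per-frame features flow through a recurrent state to yield one valence estimate per frame.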

Similar resources

Recent Advances in the Automatic Recognition of Audio-Visual Speech

Visual speech information from the speaker’s mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: First, the visual front end design,...

Audio-Visual Spontaneous Emotion Recognition

Automatic multimodal recognition of spontaneous emotional expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting—the Adult Attachment Interview (AAI). Based on the assumption that facial expression and vocal expression are at the same coarse affective states, positive and negative emotion ...

Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition

We propose a transfer deep learning (TDL) framework that can transfer the knowledge obtained from a single-modal neural network to a network with a different modality. Specifically, we show that we can leverage speech data to fine-tune the network trained for video recognition, given an initial set of audio-video parallel dataset within the same semantics. Our approach first learns the analogyp...

Noise Analysis in Audio-Visual Emotion Recognition

This paper describes the use of a decision-based fusion framework to infer emotion from audiovisual feeds, and investigates the effect of noise on the fusion system. Facial expression features are constructed from linear binary patterns, and are processed independently of the prosodic features. A linear support vector machine is used for the fusion of the two channels. The results show that the...

Journal

Journal title: Pattern Recognition Letters

Year: 2021

ISSN: 1872-7344, 0167-8655

DOI: https://doi.org/10.1016/j.patrec.2021.03.007